%store -r reddit_df
reddit_df = reddit_df
import pandas as pd

The compound score is a single, normalized number representing the overall sentiment of the text, ranging from -1 (most negative) to +1 (most positive). VADER calculates this by looking up each word’s sentiment (valence) in its dictionary, adjusting for things like capital letters (“GREAT”), punctuation (“!”), and modifiers (“very”). It then combines these individual scores and normalizes them to produce the final compound value.
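The normalization step can be made concrete. The sketch below follows what I understand the reference vaderSentiment implementation to do: divide the summed valence by the square root of its square plus a constant alpha (15 in that code). The function name and the sample inputs are illustrative, and the constant is worth checking against the installed version.

```python
import math


def normalize_valence(total_valence, alpha=15.0):
    """Squash an unbounded summed valence into the range [-1, 1].

    alpha=15 is the constant I believe the reference implementation uses;
    treat it as an assumption, not a guarantee.
    """
    norm = total_valence / math.sqrt(total_valence * total_valence + alpha)
    # Clamp to guard against floating-point overshoot at extreme inputs.
    return max(-1.0, min(1.0, norm))


# Illustrative inputs only: larger absolute sums approach -1 or +1.
for total in (-8.0, -2.0, 0.0, 2.0, 8.0):
    print(f"{total:+.1f} -> {normalize_valence(total):+.4f}")
```

Because the curve flattens as the sum grows, a long post with many negative words cannot score below -1, which keeps posts of different lengths comparable.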
This function finds the sentiment score of every single full post in the dataframe
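The code for this step is not shown here, so the following is a minimal sketch of how such a function might look rather than the exact one used. The function name `add_vader_scores` and the column name `full_text` are assumptions; the only VADER call relied on is `polarity_scores`, which returns a dict with the keys `neg`, `neu`, `pos`, and `compound`.

```python
import pandas as pd


def add_vader_scores(df, analyzer, text_col="full_text"):
    """Return a copy of df with one new column per VADER score.

    `analyzer` is any object with a polarity_scores(text) method, such as
    vaderSentiment's SentimentIntensityAnalyzer. `text_col` is a
    hypothetical column holding the full text of each post.
    """
    # Treat missing posts as empty strings so every row gets a score.
    texts = df[text_col].fillna("").astype(str)
    scores = texts.apply(analyzer.polarity_scores)
    # Expand the list of score dicts into columns aligned to the original index.
    score_df = pd.DataFrame(scores.tolist(), index=df.index)
    return df.join(score_df)


# Usage with the real analyzer (requires the vaderSentiment package):
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   reddit_df = add_vader_scores(reddit_df, SentimentIntensityAnalyzer())
```

Passing the analyzer in as an argument means it is built once and reused for every post instead of being recreated on each row.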
Assign labels for each of the outputs of the VADER sentiment scores
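This caption can be read two ways, so the sketch below covers both under stated assumptions: renaming VADER's short keys to the column names that appear in the tables in this notebook, and mapping a compound score to a category. The helper names are hypothetical, and the 0.05 cutoff is the conventional threshold suggested in VADER's documentation, not something confirmed for this analysis.

```python
import pandas as pd

# VADER's own keys mapped to the readable column names used in this notebook.
VADER_LABELS = {"neg": "negative", "neu": "neutral", "pos": "positive"}


def label_score_columns(df):
    """Rename VADER's short keys; the 'compound' column keeps its name."""
    return df.rename(columns=VADER_LABELS)


def sentiment_category(compound, threshold=0.05):
    """Map a compound score to 'positive', 'negative', or 'neutral'.

    The default threshold is an assumption based on VADER's suggested cutoffs.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Illustrative scores only, not values from the dataset.
raw = pd.DataFrame({"neg": [0.4], "neu": [0.5], "pos": [0.1], "compound": [-0.6]})
labeled = label_score_columns(raw)
labeled["sentiment"] = labeled["compound"].apply(sentiment_category)
print(labeled)
```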
Categorize the data into buckets of different COVID periods
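The cutoff dates for the three periods are not stated, so the sketch below uses two hypothetical boundaries: 11 March 2020 (the WHO pandemic declaration) and 5 May 2023 (the WHO ending the global health emergency). The column name `created_date` and both helper names are also assumptions; swap in whatever dates and columns the analysis actually used.

```python
import pandas as pd

# Hypothetical period boundaries; the notebook does not state the real ones.
COVID_START = pd.Timestamp("2020-03-11")
COVID_END = pd.Timestamp("2023-05-05")


def covid_period(ts):
    """Return the period label for one timestamp, or None if it is missing."""
    if pd.isna(ts):
        return None
    if ts < COVID_START:
        return "Pre-COVID"
    if ts <= COVID_END:
        return "During COVID"
    return "Post-COVID"


def add_covid_period(df, date_col="created_date"):
    """Return a copy of df with 'year' and 'covid_period' columns added."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["year"] = dates.dt.year
    out["covid_period"] = dates.apply(covid_period)
    return out


# Illustrative dates only.
sample = pd.DataFrame({"created_date": ["2020-01-15", "2021-06-01", "2024-01-01"]})
print(add_covid_period(sample))
```

Note that a post from early 2020 falls in the ‘Pre-COVID’ bucket while still counting toward the 2020 yearly average, so the yearly and period groupings do not line up exactly.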
Average Sentiment Scores by Year

      compound  positive  negative  neutral
year
2020   -0.1997    0.1113    0.1775   0.7112
2021   -0.1710    0.0979    0.1558   0.7463
2022   -0.1965    0.1066    0.1556   0.7378
2023   -0.2497    0.1114    0.1573   0.7313
2024   -0.2382    0.1056    0.1620   0.7324
2025   -0.2772    0.1026    0.1616   0.7358

Here, I’m grouping my sentiment data by year to see the long-term trends. I calculate the average for the compound, positive, negative, and neutral scores for each year.
The resulting table shows that the average compound score is negative in every year. It is least negative in 2021 (-0.1710) and most negative in 2025 (-0.2772), and the 2023 to 2025 averages (-0.24 to -0.28) all sit below the 2020 to 2022 averages (-0.17 to -0.20). This suggests that the language in these subreddits has grown more negative in the post-pandemic period than it was during the pandemic itself.
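The grouping described above can be sketched as follows. The column names match the table, and `yearly_sentiment` is the name the plotting code uses later; the helper function name and the toy rows are illustrative assumptions.

```python
import pandas as pd

SCORE_COLS = ["compound", "positive", "negative", "neutral"]


def yearly_sentiment_table(df):
    """Average each sentiment score per year, rounded to four decimals."""
    return df.groupby("year")[SCORE_COLS].mean().round(4)


# Toy rows for illustration; these are not values from the dataset.
toy = pd.DataFrame(
    {
        "year": [2020, 2020, 2021],
        "compound": [-0.2, -0.4, 0.1],
        "positive": [0.1, 0.3, 0.2],
        "negative": [0.2, 0.2, 0.1],
        "neutral": [0.7, 0.5, 0.7],
    }
)
yearly_sentiment = yearly_sentiment_table(toy)
print(yearly_sentiment)
```

The result is indexed by `year`, which is the shape the line plot further down expects.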
Sentiment by COVID Period

             compound                 negative positive
                 mean     std  count      mean     mean
covid_period
During COVID  -0.1967  0.6350  10517    0.1624   0.1049
Post-COVID    -0.2485  0.7024   6218    0.1600   0.1073
Pre-COVID     -0.1463  0.6532    576    0.1747   0.1142

This code groups all my posts into the ‘Pre-COVID’, ‘During COVID’, and ‘Post-COVID’ periods. For each period, it’s calculating the average compound, negative, and positive scores. It also gets the standard deviation (std) and post count for the main compound score to check for consistency and data volume.
This output shows that the average compound score was more negative during the pandemic (-0.1967) than in the ‘Pre-COVID’ period (-0.1463), although the Pre-COVID average rests on only 576 posts, so that comparison is the least certain of the three. More importantly, the sentiment in the ‘Post-COVID’ period (-0.2485) is more negative still. This suggests that the collective mental health discourse in these subreddits has not improved and has instead trended further into negative territory.
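A minimal sketch of the aggregation that would produce a table of this shape is shown below. The column names come from the table itself; the helper function name and the toy rows are assumptions for illustration.

```python
import pandas as pd


def period_sentiment_table(df):
    """Summarize sentiment per COVID period.

    compound gets mean, std, and count; negative and positive get mean only.
    """
    return (
        df.groupby("covid_period")
        .agg(
            {
                "compound": ["mean", "std", "count"],
                "negative": "mean",
                "positive": "mean",
            }
        )
        .round(4)
    )


# Toy rows for illustration; these are not values from the dataset.
toy = pd.DataFrame(
    {
        "covid_period": ["Pre-COVID", "Pre-COVID", "Post-COVID"],
        "compound": [0.0, 0.2, -0.4],
        "negative": [0.1, 0.3, 0.2],
        "positive": [0.1, 0.1, 0.0],
    }
)
period_table = period_sentiment_table(toy)
print(period_table)
```

`groupby` sorts the period labels alphabetically, which is why ‘During COVID’ appears before ‘Pre-COVID’ in the output rather than in chronological order.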
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "notebook"

# Create the base line plot
fig = px.line(
    yearly_sentiment.reset_index(),  # Expose the 'year' index as a regular column
    x="year",
    y="compound",
    markers=True,  # Draw a marker at each yearly value
    text="compound",  # Use the 'compound' column for text labels
    # Set titles and labels
    title="Reddit Mental Health Sentiment Over Time",
    labels={
        "year": "Year",
        "compound": "Average Compound Sentiment Score",
    },
)

# Add the vertical 'COVID Start' line
fig.add_vline(
    x=2020,
    line_dash="dash",
    line_color="red",
    annotation_text="COVID Start",
    annotation_position="top right",
)

# Format the text labels on the points
fig.update_traces(
    textposition="bottom right",  # Place each label below and to the right of its marker
    texttemplate="%{text:.3f}",  # Show scores with three decimal places
)

# Adjust layout size and grid color (Plotly's grid is on by default)
fig.update_layout(
    width=1000,
    height=600,
    xaxis_gridcolor="rgba(0,0,0,0.1)",  # Light, mostly transparent grid lines
    yaxis_gridcolor="rgba(0,0,0,0.1)",
)
fig.show()

The data seems to show that the pandemic’s impact on mental health wasn’t a temporary event that ended when the lockdowns lifted. Instead, average sentiment stayed negative and kept falling in the years that followed. One possible explanation is that, while the acute anxiety of the virus and lockdowns faded, it was replaced by chronic stressors like economic inflation, job instability, and the stress of “returning to normal”, which created a new set of anxieties, as we’ll later explore in this dataset.
Furthermore, the prolonged social isolation may have caused lasting damage to social structures and individual well-being, leading to persistent loneliness and disconnection. The data suggests we are now living with the compounded consequences of the pandemic, which may be as detrimental to collective mental health as the initial crisis itself, or more so.
# Check data distribution
print("Posts per year:")
print(reddit_df['year'].value_counts().sort_index())

# Compare average compound sentiment across the three COVID periods
print("\nAverage compound sentiment by period:")
for period in ['Pre-COVID', 'During COVID', 'Post-COVID']:
    data = reddit_df[reddit_df['covid_period'] == period]
    if len(data) > 0:
        print(f"{period}: {data['compound'].mean():.4f} (n={len(data)})")

Posts per year:
year
2020    3536
2021    3567
2022    3159
2023    2775
2024    2993
2025    1281
Name: count, dtype: int64

Average compound sentiment by period:
Pre-COVID: -0.1463 (n=576)
During COVID: -0.1967 (n=10517)
Post-COVID: -0.2485 (n=6218)

I have a solid collection of posts, with thousands from each year between 2020 and 2024. The lower number for 2025 just means I ran the data collection part-way through that year.
The average sentiment was already negative Pre-COVID (-0.1463) and became more negative During COVID (-0.1967). Most importantly, instead of recovering, the sentiment has grown even more negative in the Post-COVID period (-0.2485), suggesting that the shift in mental health discourse in these subreddits has persisted and deepened rather than reversed.
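One way to gauge how much weight each gap can bear is to express it in standard errors, using only the mean, std, and count already reported in the period table. This is a rough sketch, not a formal test: the helper names are hypothetical, and it assumes posts are independent observations, which posts from the same threads or authors may not be.

```python
import math

# (mean, std, count) of the compound score, copied from the period table above.
PERIOD_STATS = {
    "Pre-COVID": (-0.1463, 0.6532, 576),
    "During COVID": (-0.1967, 0.6350, 10517),
    "Post-COVID": (-0.2485, 0.7024, 6218),
}


def standard_error(std, n):
    """Standard error of a mean computed from n observations."""
    return std / math.sqrt(n)


def gap_in_standard_errors(earlier, later):
    """Difference in means (later minus earlier) divided by its standard error."""
    mean_a, std_a, n_a = earlier
    mean_b, std_b, n_b = later
    se_diff = math.sqrt(standard_error(std_a, n_a) ** 2 + standard_error(std_b, n_b) ** 2)
    return (mean_b - mean_a) / se_diff


for earlier, later in [("Pre-COVID", "During COVID"), ("During COVID", "Post-COVID")]:
    gap = gap_in_standard_errors(PERIOD_STATS[earlier], PERIOD_STATS[later])
    print(f"{earlier} -> {later}: {gap:.2f} standard errors")
```

With the figures reported above, the Pre-COVID to During COVID gap comes out at roughly -1.8 standard errors and the During COVID to Post-COVID gap at roughly -4.8, so the second shift is the better supported of the two, largely because the Pre-COVID group is so small.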